Subgroup analyses

  • Generalizing overall treatment effects is often problematic
  • Subgroup analyses rarely adequately powered

Subgroup analyses

Subgroup analyses can be divided into 4 categories (Varadhan et al, 2013):

  • Confirmatory heterogeneity of treatment effect analysis
  • Exploratory heterogeneity of treatment effect analysis
  • Descriptive heterogeneity of treatment effect analysis
  • Predictive heterogeneity of treatment effect analysis

Predictive HTE methods

Risk modeling

  • A multivariate regression model \(f\) that predicts the risk of an outcome \(y\) based on the predictors \(x_1\dots x_p\) is identified or developed.
  • The expected outcome of a patient receiving treatment \(T\) (where \(T = 1\), when patient is treated and \(0\) otherwise) based on the linear predictor \[lp(x_1,\dots x_p) = a + \beta_1x_1 +\dots\beta_px_p\] from a previously derived risk model can be described as \[E\{y|x_1,\dots,x_p\} = f(lp + \gamma_0T+\gamma T\times lp)\]

Risk-based HTE

Reasoning

  • When risk is described through a combination of factors the control event rate will typically vary considerably across the trial population.
  • The absolute risk difference will generally vary across risk strata even if the relative risk is the same
  • When a trial population has substantial variation in outcome risk, important differences often exist in harm–benefit tradeoffs

Individualized approaches

  • Stratification approach may not provide adequate prediction of benefit
  • “Jumps” at cut-offs are not realistic
  • Implement a risk-based smoothing approach

Setting

For all patients we observe covariates \(x_1,\dots,x_8\), of which \(4\) are continuous and \(4\) are binary. More specifically,

\[x_1,\dots,x_4 \sim N(0, 1)\] \[x_5,\dots,x_8 \sim B(1, 0.2)\]

Generate the binary outcomes \(y\) for the untreated patients (\(t_x=0\)), based on

\[P(y\:\vert\:x, t_x=0) = g(\beta_0 + \beta_1x_1+\dots+\beta_8x_8) = g(lp_0),\]

where \[g(x) = \frac{e^x}{1+e^x}\]

Setting

For treated patients, outcomes are generated from:

\[P(y\:\vert\:x, t_x=1) = g(lp_1)\]

where \[lp_1 = \gamma_2(lp_0-c)^2+\gamma_1(lp_0-c)+\gamma_0\]

Setting

Figure 1: Linear and quadratic deviations from the base-case scenario of constant relative effect (OR=0.8)Figure 1: Linear and quadratic deviations from the base-case scenario of constant relative effect (OR=0.8)

Figure 1: Linear and quadratic deviations from the base-case scenario of constant relative effect (OR=0.8)

Setting

Figure 2: Linear and quadratic deviations from the base-case scenario of constant relative effect (OR=0.8)Figure 2: Linear and quadratic deviations from the base-case scenario of constant relative effect (OR=0.8)

Figure 2: Linear and quadratic deviations from the base-case scenario of constant relative effect (OR=0.8)

Setting

Base-case scenario

  • Constant odds ratio: 0.8
  • Sample size: 4250
  • Outcome incidence, if left untreated: 20%
  • Prediction model “true” AUC: 0.75

Deviations

  • Sample size: 1064, 17000
  • Overall treatment effect: 0.5, 1
  • Prediction performance: 0.65, 0.85

Setting

Finally, we consider 3 additional scenarios of interaction of individual covariates with treatment. These scenarios include:

  • 4 weak interactions: \(\text{OR}_{t_x=1} / \text{OR}_{t_x=0}=0.82\)
  • 4 strong interactions: \(\text{OR}_{t_x=1} / \text{OR}_{t_x=0}=0.61\)
  • 2 weak and 2 strong interactions

Methods

Constant treatment effect \[E\{y\:\vert\:x,t_x\} = P(y\:\vert\:x, t_x) = g(\beta_0+\beta_1x_1+\dots+\beta_8x_8+\gamma t_x)\] Absolute benefit is estimated from: \[\hat{f}_{\text{benefit}}(lp\:\vert\:x, \hat{\beta}) = g(lp) - g(lp+\hat{\gamma}) \]

Methods

Linear interaction \[E\{y\:\vert\:x, t_x, \hat{\beta}\} = g\big(lp+(\delta_0+\delta_1lp)t_x\big)\]

Predict absolute benfit from: \[\hat{f}_{\text{benefit}}(lp\:\vert\:x, \hat{\beta}) = g(lp) - g\big(\delta_0+(1+\delta_1)lp\big)\]

Methods

Non-linear interaction \[f_{\text{benefit}}(lp\:\vert\:x,\hat{\beta}) = \hat{f}_{\text{smooth}}(lp\:\vert\:x, \hat{\beta}, t_x=0) - \hat{f}_{\text{smooth}}(lp\:\vert\:x, \hat{\beta}, t_x=1)\] We use restricted cubic spline smoothing wit 3, 4 and 5 knots.

Methods

Adaptive approach

Adaptive approach using AIC. Candidate models are:

  • Constant treatment effect model
  • linear interaction model
  • RCS models (3, 4 and 5 knots)

Evaluation

Root mean squared error

Assuming that \(\tau(x)=E\{y\:\vert\:x, t_x=0\} - E\{y\:\vert\:x,t_x=1\}\) is the true benefit for each patient and \(\hat{\tau}(x)\) is the estimated benefit from a method under study, the ideal loss function to use for the considered methods would be the unobservable root mean squared error \(E\big\{(\hat{\tau} - \tau)^2\:\vert\:x\big\}\).

We will estimate the RMSE from \[\text{RMSE}=\frac{1}{n}\sum_{i=1}^n\big(\tau(x_i) - \hat{\tau}(x_i)\big)^2\]

Evaluation

C-for-benefit

  • Match patients on predicted benefit
  • Use differences of observed outcomes within pairs as “observed benefit”
  • Proceed with calculation of the concordance statistic

Calibration for benefit

  • Match patients on predicted benefit
  • Use differences of observed outcomes within pairs as “observed benefit”
  • Fit smooth calibration curve
  • Calculate the integrated calibration index

Simulations

  • Total of 66 scenarios
  • Replicate each scenario 1000 times
  • For each scenario, evaluations are made on a super-population of 500,000

Results: RMSE

Figure 3: RMSE of the considered methods across 500 replications calculated in a simulated superpopulation of size 500,000.

Figure 3: RMSE of the considered methods across 500 replications calculated in a simulated superpopulation of size 500,000.

Results: RMSE

Figure 4: RMSE of the considered methods across 500 replications calculated in a simulated sample of size 500,000. Sample size 17,000 rather than 4250 in Figure 3.

Figure 4: RMSE of the considered methods across 500 replications calculated in a simulated sample of size 500,000. Sample size 17,000 rather than 4250 in Figure 3.

Results: RMSE

Figure 5: RMSE of the considered methods across 500 replications calculated in a simulated sample of size 500,000. AUC is 0.85 instead of 0.75 in Figure 3.

Figure 5: RMSE of the considered methods across 500 replications calculated in a simulated sample of size 500,000. AUC is 0.85 instead of 0.75 in Figure 3.

Results: Discrimination for benefit

Figure 6: Discrimination for benefit of the considered methods across 500 replications of the base case scenario calculated in a simulated superpopulation of size 500,000.

Figure 6: Discrimination for benefit of the considered methods across 500 replications of the base case scenario calculated in a simulated superpopulation of size 500,000.

Results: Calibration for benefit

Figure 7: Calibration for benefit of the considered methods across 500 replications of the base case scenario calculated in a simulated superpopulation of size 500,000.

Figure 7: Calibration for benefit of the considered methods across 500 replications of the base case scenario calculated in a simulated superpopulation of size 500,000.

Results: Case study

Data from GUSTO-I trial

  • Patients with acute myocardial infarction were randomized to:

    • tissue plasminogen activator (10,348)
    • streptokinase (10,348)
  • Outcome: 30-day mortality

Results: Case study

In line with previous analyses (Califf et al, 1997 and Steyerberg et al) we fitted a logistic regression model for the prediction of 30-day mortality using:

  • age
  • Killip class
  • systolic blood pressure
  • heart rate
  • indicator of previous myocardial infarction (binary)
  • location of myocardial infarction

Results: Case study

Figure 8: Calibration for benefit of the considered methods across 500 replications of the base case scenario calculated in a simulated superpopulation of size 500,000.

Figure 8: Calibration for benefit of the considered methods across 500 replications of the base case scenario calculated in a simulated superpopulation of size 500,000.

Conclusions

  • Linear interaction models had adequate performance in the majority of the scenarios considered
  • Flexible models required larger sample sizes and higher AUC to outperform the linear interaction model
  • The adaptive approach tended to select simpler models with smaller sample sizes and/or worse prediction performance.

Appendix A: Integrated calibration index

  • Harrell: \[\text{E}_\text{max}(a, b) = \text{max}_{a \leq \hat{P} \leq b} |\hat{P} - \hat{P}_c |\]

  • Austin et al: \[\text{ICI} = \int_0^1f(x)\phi(x)dx\]